Deep Bayesian Neural Nets as Deep Matrix Gaussian Processes
Authors
Abstract
We show that by employing a distribution over random matrices, the matrix variate Gaussian (Gupta & Nagar, 1999), for the neural network parameters, we can obtain a non-parametric interpretation of the hidden units after the application of the "local reparametrization trick" (Kingma et al., 2015). This provides a nice duality between Bayesian neural networks and deep Gaussian processes (Damianou & Lawrence, 2013), a property that was also shown by Gal & Ghahramani (2015). We show that we can borrow ideas from the Gaussian process literature so as to exploit the non-parametric properties of such a model. We empirically verify this model on a regression task.

1 MATRIX-VARIATE GAUSSIAN

The matrix variate Gaussian (Gupta & Nagar, 1999) is a three-parameter distribution that governs a random matrix, e.g. W:

p(W) = \mathcal{MN}(M, U, V) = \frac{\exp\left(-\frac{1}{2}\operatorname{tr}\left[V^{-1}(W - M)^{T} U^{-1}(W - M)\right]\right)}{(2\pi)^{rc/2}\,|V|^{r/2}\,|U|^{c/2}}    (1)

where M is an r × c matrix that is the mean of the distribution, U is an r × r matrix that provides the covariance of the rows, and V is a c × c matrix that governs the covariance of the columns of the matrix. According to Gupta & Nagar (1999), this distribution is essentially a multivariate Gaussian distribution over the "flattened" matrix W: p(\operatorname{vec}(W)) = \mathcal{N}(\operatorname{vec}(M), V \otimes U), where vec(·) is the vectorization operator (i.e. stacking the columns into a single vector) and ⊗ is the Kronecker product.

2 BAYESIAN NEURAL NETS WITH MATRIX-VARIATE GAUSSIANS

For the following derivation we assume that each input to a layer is augmented with an extra dimension containing 1's so as to account for the biases; thus we only deal with weights W on this expanded input. In order to obtain a matrix variate Gaussian posterior distribution for these weights we perform variational inference, and the derivation is straightforward and similar to (Graves, 2011; Kingma & Welling, 2014; Blundell et al., 2015; Kingma et al., 2015). Let p_\theta(W), q_\phi(W) be a matrix variate Gaussian prior and posterior distribution with parameters \theta, \phi respectively, and let (x_i, y_i)_{i=1}^{N} be the training data sampled from the empirical distribution \tilde{p}(x, y). Then the following lower bound on the marginal log-likelihood can be derived:

E_{\tilde{p}(x,y)}[\log p(Y|X)] \geq E_{\tilde{p}(x,y)}\big[ E_{q_\phi(W)}[\log p(Y|X, W)] - \operatorname{KL}(q_\phi(W)\,\|\,p_\theta(W)) \big]    (2)
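As a concrete illustration of the pieces above, the sketch below (a minimal NumPy example, not taken from the paper; the dimensions, variable names, and the standard-normal prior are assumptions made for illustration) samples a weight matrix from a matrix variate Gaussian, applies the local reparametrization trick by sampling the pre-activation a = W^T x directly from its induced Gaussian N(M^T x, (x^T U x) V), and evaluates the KL term of the bound in Eq. (2) against an isotropic Gaussian prior over vec(W).

import numpy as np

rng = np.random.default_rng(0)

def sample_matrix_normal(M, U, V):
    # W = M + A E B^T with E ~ N(0, I), U = A A^T, V = B B^T
    # gives W ~ MN(M, U, V), i.e. vec(W) ~ N(vec(M), V kron U).
    A, B = np.linalg.cholesky(U), np.linalg.cholesky(V)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

def local_reparam_preactivation(x, M, U, V):
    # Local reparametrization: for W ~ MN(M, U, V) the pre-activation
    # a = W^T x is Gaussian with mean M^T x and covariance (x^T U x) V,
    # so we can sample a directly instead of sampling W.
    mean = M.T @ x
    scale = float(x @ U @ x)              # scalar x^T U x
    L = np.linalg.cholesky(scale * V)
    return mean + L @ rng.standard_normal(mean.shape)

def kl_to_standard_normal(M, U, V):
    # KL(MN(M, U, V) || N(0, I)) over vec(W), using
    # tr(V kron U) = tr(U) tr(V) and log|V kron U| = r log|V| + c log|U|.
    r, c = M.shape
    logdet = r * np.linalg.slogdet(V)[1] + c * np.linalg.slogdet(U)[1]
    return 0.5 * (np.trace(U) * np.trace(V) + np.sum(M ** 2) - r * c - logdet)

# Toy layer: r = 4 inputs (last entry of x is the constant 1 for the bias),
# c = 3 hidden units.
r, c = 4, 3
M = rng.standard_normal((r, c))
U = 0.5 * np.eye(r)
V = 0.3 * np.eye(c)
x = np.array([0.2, -1.0, 0.7, 1.0])

W = sample_matrix_normal(M, U, V)
a = local_reparam_preactivation(x, M, U, V)
kl = kl_to_standard_normal(M, U, V)

Note that, under this posterior, the cross-covariance of pre-activations for two inputs is Cov(a(x), a(x')) = (x^T U x') V, so the hidden units of a layer are jointly Gaussian with an input-dependent kernel; this is the Gaussian-process view of the hidden units referred to in the abstract.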
Similar resources
Deep Neural Networks as Gaussian Processes
A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hiddenlayer networks, the covariance function of this GP has long been known. Recent...
Scalable Gaussian Process Regression Using Deep Neural Networks
We propose a scalable Gaussian process model for regression by applying a deep neural network as the feature-mapping function. We first pre-train the deep neural network with a stacked denoising auto-encoder in an unsupervised way. Then, we perform a Bayesian linear regression on the top layer of the pre-trained deep network. The resulting model, Deep-Neural-Network-based Gaussian Process (DNN-...
Deep Gaussian Processes for Regression using Approximate Expectation Propagation
Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep mod...
Wide Deep Neural Networks
Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between Gaussian processes with a recursive kernel definition and random wide fully connected feedforward networks with more than one hidden layer. We exhibit limiting procedures under which finite deep networks ...
Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian (Gupta & Nagar, 1999) parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achie...
Publication date: 2016